MacTech Vol 04-1988

QuickTrap

Volume Number: 4

Issue Number: 3

Column Tag: The Mac Hacker

QuickTrap Routines Bypass Trap Dispatcher

By Mike Morton, University of Hawaii

Bypassing the ROM trap dispatcher

In an article a while back, I covered the basics of bypassing the Macintosh trap dispatcher to call ROM routines directly, to speed up calls to the Toolbox and OS. In this article, I’ll present a set of subroutines which implement this technique in a practical way.

The package is written in MPW assembler and should be easy to call from any of the MPW languages. It’s short and should be portable to other development systems. It also includes a “fail-soft” feature, in case it turns out not to work on some future Macintosh.

A quick review

Programs call the Macintosh Toolbox and Operating System routines by executing “illegal” A-line instructions (words of the form $Axxx), which are handed to the trap dispatching code in the ROM. First, the 680x0 processor needs time to recover from the emotional trauma of this illegal instruction. Then the dispatcher must fetch the offending instruction, decode it, and call the routine it specifies. This is very general, since it “hides” ROM locations from the application, but it’s also slow.

With the GetTrapAddress routine, you can look up the address of a ROM routine just once each time your application runs. Calling that address directly can save you a lot of time, at very little cost in generality.

What does the dispatcher do?

Here’s the code for the dispatcher in my Mac Plus ROM. Your Mac may have something a little different, but all existing Macs seem to be similar in principle. The dispatcher, at address $401F52 in my ROM, disassembles to:

disp:
        SUBQ.L  #2, SP              ; add 2 bytes above CCR
        MOVEM.L D1-D2/A2, -(SP)     ; save 12 bytes of regs
        MOVE.L  12+4(SP), A2        ; get PC of trap word
        MOVE.W  (A2)+, D2           ; get A-trap word
        MOVE.L  A2, 12+4(SP)        ; restore updated PC
        MOVE.W  D2, D1              ; copy trap word to D1
        ANDI.W  #$01FF, D2          ; get just trap number
        CMPI.W  #$A800, D1          ; Toolbox or OS?
        BLO.S   doOS                ; jump if OS
        LEA     $0C00, A2           ; point->Toolbox dispatch
        LSL.W   #2, D2              ; scale number->longwords
        MOVE.L  (A2,D2.W), 12(SP)   ; copy address to stack
        CMPI.W  #$AC00, D1          ; “auto-pop” bit set?
        MOVEM.L (SP)+, D1-D2/A2     ; restore regs; leave CCR
        BLO.S   tBox                ; skip if “auto-pop” off
        MOVE.L  (SP)+, (SP)         ; RTS to caller, not glue
tBox:   RTS                         ; “call” Toolbox routine

doOS:
        LEA     $0400, A2           ; point to OS dispatch
        BCLR    #8, D2              ; clear&test “keep A0” bit
        BNE.S   OSa0                ; skip to allow A0 returned
        LSL.W   #2, D2              ; scale number->longwords
        MOVE.L  (A2,D2.W), A2       ; fetch OS routine address
        MOVEM.L A0-A1, -(SP)        ; save regs (incl A0)
        JSR     (A2)                ; call OS routine
        MOVEM.L (SP)+, A0-A1        ; and restore OS regs
OSrt:
        MOVEM.L (SP)+, D1-D2/A2     ; restore OUR regs
        ADDQ.W  #4, SP              ; ignore stacked CCR
        TST.W   D0                  ; preset CCR on result
        RTS                         ; and return

OSa0:
        LSL.W   #2, D2              ; scale number->longwords
        MOVE.L  (A2,D2.W), A2       ; fetch OS routine address
        MOVE.L  A1, -(SP)           ; preserve A1, *not* A0
        JSR     (A2)                ; call OS routine
        MOVE.L  (SP)+, A1           ; and restore A1
        BRA.S   OSrt                ; clean up with common code

[An aside: This is the first piece of ROM code I ever read, and I still think it’s a great example of tight 68000 coding. It’s even tighter on the Mac II, where the 68020’s memory-indirect addressing modes are available. I can’t see any way to make it faster; can anyone spot a way to save a few bytes, though?]

Besides figuring out which routine to call, the dispatcher also does some other important things. It picks the routine from the Toolbox dispatch table at $0C00 or the OS table at $0400.

- For Toolbox traps, it discards the return address into the calling “glue” routine if the “auto-pop” bit is set. The Toolbox routine then returns straight to the glue’s caller.
- For OS traps, it preserves D1, D2, A1 and A2, and sometimes A0.
- For OS traps, it also passes the trap word itself to the routine in D1.W.
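The decoding steps in the listing can be modeled in a few lines of C. The model shows which table a trap word indexes, which slot address it computes, and which flag bits it honors. This is a hypothetical sketch of the Mac Plus logic shown above, not ROM code. The names TrapInfo and decodeTrap are invented for illustration, and the model reproduces only this ROM’s 9-bit masking.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the Mac Plus dispatcher (as listed above) extracts from an A-line
   trap word. Hypothetical C model, not ROM code. */
typedef struct {
    bool     toolbox;   /* trap word >= $A800: Toolbox trap, else OS trap         */
    uint16_t index;     /* entry number in the chosen dispatch table              */
    uint32_t slotAddr;  /* low-memory address of that entry's table slot          */
    bool     autoPop;   /* Toolbox only: word >= $AC00, skip the glue's return    */
    bool     keepA0;    /* OS only: bit 8 set, A0 is returned instead of restored */
    uint16_t d1;        /* value in D1.W when an OS routine runs: the whole word  */
} TrapInfo;

TrapInfo decodeTrap(uint16_t trapWord)
{
    TrapInfo t = {0};
    uint16_t num = trapWord & 0x01FF;          /* ANDI.W #$01FF, D2             */

    t.d1 = trapWord;                           /* MOVE.W D2, D1 before masking  */
    t.toolbox = trapWord >= 0xA800;            /* CMPI.W #$A800, D1; BLO.S doOS */
    if (t.toolbox) {
        t.autoPop = trapWord >= 0xAC00;        /* CMPI.W #$AC00, D1             */
        t.index = num;
        t.slotAddr = 0x0C00 + 4u * num;        /* LEA $0C00, A2; LSL.W #2, D2   */
    } else {
        t.keepA0 = (num & 0x0100) != 0;        /* BCLR #8, D2 tests the bit ... */
        t.index = num & 0x00FF;                /* ... and clears it             */
        t.slotAddr = 0x0400 + 4u * t.index;    /* LEA $0400, A2; LSL.W #2, D2   */
    }
    return t;
}
```

For example, SetPort’s trap word $A873 decodes to Toolbox entry $73, whose slot is at $0DCC. NewHandle’s $A122 decodes to OS entry $22 with the “keep A0” bit set, and D1.W still holds the full $A122.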

Our task is to make a trap “dispatcher” which does all this, but much faster. Note, for instance, that the new code must still pass the trap word in D1.W. I believe this is how some routines test for flag bits set in the word. (For instance, CmpString has a bit to specify whether the comparison is case-sensitive.)

Hey, wait a minute! Isn’t it a bad idea to know how one ROM routine (the dispatcher) communicates with all the others? Isn’t code which depends on this interface likely to fall apart when the Mac III hits the streets? Well, first of all, it’d be awfully hard for Apple to change hundreds of routines. But more importantly, there’s a way to back out gracefully. Trust me; we’ll get to it.

An application’s view of the QuickTrap routines

The fundamental speedup is to get rid of the dispatcher and have one “quick trap” routine for every real routine you’d like fast access to. For instance, if your program does a lot of SetPort calls, you can easily create “qtSetPort”. It has exactly the same interface and does the same thing, only faster. As you might guess, each qtxxx routine caches the address of the real routine it stands in for.

Once, at the beginning of your application, you must call qtEval, which “evaluates” each address and stores it. If you don’t call it, everything will still work, just through the ordinary trap dispatcher. This is related to the fail-soft scheme.
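Here is a minimal C model of the qtxxx scheme. The real package is MPW assembler. In this sketch a single flat array stands in for the ROM dispatch tables, indexed by the low nine bits of the trap word; that is a simplification, since the Mac keeps separate OS and Toolbox tables. The names getTrapAddressModel, dispatchModel, QuickTrap, qtCall and the sample routine doubleIt are hypothetical. qtEval borrows the article’s name, but its signature here is an assumption. The model shows the fail-soft shape: a record with no cached address falls back to the slow, dispatched path.

```c
#include <stddef.h>
#include <stdint.h>

typedef long (*Routine)(long arg);   /* simplified: every routine takes and returns a long */

/* Hypothetical stand-in for the ROM dispatch tables: one flat table indexed by
   the low nine bits of the trap word. */
static Routine trapTable[512];
static long dispatchCount = 0;       /* how many calls went through the slow path */

static Routine getTrapAddressModel(uint16_t trapWord)   /* plays GetTrapAddress */
{
    return trapTable[trapWord & 0x01FF];
}

static long dispatchModel(uint16_t trapWord, long arg)  /* plays "execute the A-line trap" */
{
    dispatchCount++;
    return getTrapAddressModel(trapWord)(arg);
}

/* One record per qtxxx routine: the trap word it replaces and its cached address. */
typedef struct {
    uint16_t trapWord;
    Routine  cached;                 /* NULL until qtEval fills it in */
} QuickTrap;

/* Fill every cached address, or clear them all when caching is disabled. */
void qtEval(QuickTrap *qts, size_t n, int cachingEnabled)
{
    for (size_t i = 0; i < n; i++)
        qts[i].cached = cachingEnabled ? getTrapAddressModel(qts[i].trapWord) : NULL;
}

/* What each qtxxx routine does: call directly if cached, otherwise fall back. */
long qtCall(const QuickTrap *qt, long arg)
{
    if (qt->cached != NULL)
        return qt->cached(arg);               /* fast path: no dispatcher      */
    return dispatchModel(qt->trapWord, arg);  /* fail-soft: take the real trap */
}

static long doubleIt(long x) { return 2 * x; }  /* hypothetical sample routine */
```

Suppose trapTable[0x73] holds doubleIt and a QuickTrap record holds $A873. Then qtCall returns the same answer before and after qtEval. Only after qtEval does it skip dispatchModel.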

Other than this, everything works the same as old-style trap routines.

Caching problems

Imagine that you spend a lot of time doing FrameOval calls to draw circles on the screen, and would like to speed this up. (Actually, I’m sure the trap time is insignificant compared to the drawing time; this is just an example.) You install “qtFrameOval” and call it instead. Everything works great.

Now your friend gives you this neat, public-domain desk accessory which causes all ovals to be drawn on your screen with smile-faces in them. [Any takers to write this, by the way? You could call it The Smiling Moose.] It does this by patching the FrameOval trap to point at its own code. But since your application never executes that trap, its ovals are drawn unmolested. How can you make sure your ovals are happy?

The answer is to call qtEval at the right times. That means not just at initialization, but whenever you suspect someone has installed a replacement trap routine. The qtxxx routines are supposed to “cache” the real addresses. So they must pick up new addresses when replacements are installed, or the cache becomes “stale”.

One way to do this is to call qtEval at each of these points:

- every time you regain control from a desk accessory;
- each time you regain control from Switcher or MultiFinder;
- each time you invoke an FKEY;
- perhaps after every SystemTask call;
- whenever your application makes any SetTrapAddress calls for the relevant traps.

In short, whenever anyone could have changed trap addresses, refresh the cache.

A simpler approach is to patch the SetTrapAddress trap with a prefix routine. The prefix sets a flag in your globals to say that re-evaluation is needed. If DAs, FKEYs, etc., play by the rules and use SetTrapAddress, nobody can get the trap tables out of sync with your cached addresses.
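The flag-setting prefix can be modeled the same way. In this hypothetical C model, the pieces map onto the scheme described above:

- patchedSetTrapAddress plays the prefix routine installed on SetTrapAddress.
- qtNeedsEval is the flag in your globals.
- qtRefreshIfNeeded re-runs the evaluation only when the flag is set.

The flat table, the names, and the argument order are all assumptions. The argument order (address first, then trap number) mirrors SetTrapAddress. A real prefix would typically be an assembly patch that sets the flag and then jumps to the original SetTrapAddress.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*Routine)(long arg);

/* Hypothetical flat trap table, indexed by the low nine bits of the trap word. */
static Routine trapTable[512];
static bool qtNeedsEval = false;     /* set by the patch, cleared by qtEval */

/* The unpatched SetTrapAddress, modeled as a plain table store. */
static void setTrapAddressModel(Routine addr, uint16_t trapWord)
{
    trapTable[trapWord & 0x01FF] = addr;
}

/* Prefix patch: record that the caches may be stale, then do the real work. */
void patchedSetTrapAddress(Routine addr, uint16_t trapWord)
{
    qtNeedsEval = true;
    setTrapAddressModel(addr, trapWord);
}

typedef struct { uint16_t trapWord; Routine cached; } QuickTrap;

void qtEval(QuickTrap *qts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        qts[i].cached = trapTable[qts[i].trapWord & 0x01FF];
    qtNeedsEval = false;
}

/* Re-evaluate only if someone has changed a trap address since the last qtEval. */
void qtRefreshIfNeeded(QuickTrap *qts, size_t n)
{
    if (qtNeedsEval)
        qtEval(qts, n);
}

static long plainOval(long x)  { return x; }         /* hypothetical original routine */
static long smileyOval(long x) { return x + 1000; }  /* hypothetical DA replacement   */
```

Until qtRefreshIfNeeded runs, the cached pointer still reaches the unpatched routine. That window is the stale-cache problem described above. So the check belongs just before code that relies on the qtxxx routines. This example uses FrameOval’s trap word, $A8B7.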

It’s tempting to call qtEval in your idle loop as a heavy-handed way to make sure it’s done often enough. I suspect this is a bad idea; it can cause seemingly random bugs.

One other way: suppose you use, for instance, qtFrameOval only in some code which doesn’t relinquish control. Then call qtEval once each time before you enter that code. Remember that qtEval isn’t all that speedy, since it must call GetTrapAddress for every qtxxx routine.

Reasons not to use these routines

Because the routines are called with JSR, each call takes four bytes instead of two. This is no big problem for most applications, but it’s one reason not to convert every call.

When you’re debugging, commands to break on traps don’t work, since your application is not executing trap instructions. You can force these traps to occur by disabling the caching; see below for details.

The routines use impure code. You must make sure you put them in a segment which is locked in memory.

Which traps should you replace?

Remember that many traps take so much time that speeding up their dispatch isn’t worth it. Others do next to nothing, and speed up a lot. In early use of these routines at Lotus, we estimated that about thirty routines were worth replacing. In the OS world, these included things like BlockMove and UprString. Routines which just twiddle handles are also important, like HLock, HUnlock, HPurge, HNoPurge, and GetHandleSize. Among the Toolbox routines, things like MoveTo and SetPort seemed to help.

Even if a routine is slow, it may be worth tweaking if it’s called a lot. We got measurable improvements substituting for CharWidth, DrawString, StringWidth, and SystemTask.

You can also replace package calls, which is kind of a pain. Say you want to change all the FP68K traps to qtFP68K. Since each of the SANE macros invokes the trap, you have to change Apple’s include files.

Another solution is to just redefine FP68K as a macro that does a JSR to the qtxxx routine. But then you have to define a trap like “myFP68K” which still expands to the A-line trap. This is because the qtxxx routine must have a copy of the trap word.

How much does it help?

As the TV diet ads say, results vary directly with how closely you stick to the plan. Average performance in a large Macintosh product at Lotus improved by about 5%. A couple of heavily CPU-bound loops improved by 15%. These aren’t huge gains. But considering that they took only a day or so of work to install in a very large program, they’re pretty good.

When does the warranty run out?

OK, it’s time to face the music. Because these routines dive directly into the ROM, they may someday dive into ROM routines on a new machine which expect different parameters. (For more on this topic, see Macintosh Technical Note #110.) Even if the ROM doesn’t change, a caching problem may come up. Your application’s users might alter trap addresses in some odd way that makes your cache stale.

The initialization routine qtEval can easily be disabled by modifying resources. For instance, a user may call to complain that some FKEY or DA doesn’t work with your application. You can quickly change a copy of the application to disable address caching and test whether that’s the problem. If it is, you have two options: distribute the altered application, or tell power users how to edit the resources in the copy they already have.

The resource used to control caching is QTRP 257. The format is simple: if the resource is present and its first word is zero, caching is enabled. To turn off caching, just remove this resource with ResEdit (or renumber it, so caching is easy to restore).
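The QTRP 257 rule is small enough to state as code. This C sketch takes the resource’s bytes and reports whether qtEval should cache addresses. The bytes pointer is NULL when the resource is missing. The function name qtCachingEnabled is hypothetical. The text only says that a present resource whose first word is zero enables caching. Two cases are assumed here to mean “caching off”: a resource shorter than one word, and a resource whose first word is nonzero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for the QTRP 257 rule. 'data' points at the resource's
   bytes, or is NULL if the resource is absent (removed or renumbered). */
bool qtCachingEnabled(const uint8_t *data, size_t size)
{
    if (data == NULL || size < 2)    /* assumption: missing or too short => off */
        return false;
    /* 68000 words are big-endian: high byte first. */
    uint16_t first = (uint16_t)((data[0] << 8) | data[1]);
    return first == 0;               /* present and first word zero => on */
}
```

On a real Mac the bytes would come from GetResource('QTRP', 257). A NULL handle would mean the resource has been removed or renumbered.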
